The global convergence properties of an adaptive QP-free method without a penalty function or a filter for minimax optimization

In this paper, we propose an adaptive QP-free method without a penalty function or a filter for minimax optimization. In each iteration, the method solves two systems of linear equations constructed from the Lagrange multipliers and an NCP-function reformulation of the KKT conditions. A working set further reduces the computational scale. Instead of the filter structure, we adopt a nonmonotonic equilibrium mechanism with an adaptive parameter that is adjusted according to the result of each iteration. The feasibility of the algorithm is established, and its convergence is demonstrated under some assumptions. Numerical results and a practical application are reported at the end.

The minimax problem is a specific class of nonsmooth optimization problems. General optimization methods cannot be applied directly to (1) because the objective function is nondifferentiable. Common approaches for (1) include gradient sampling methods [1,2], cutting plane methods, bundle methods [3], and subgradient methods [4].
Smoothing is one of the most popular classes of methods for solving nonsmooth problems. Previous work follows two main approaches. The first approximates the nondifferentiable function by a smooth exponential function with parameters (also called an entropy function). Shor [4] proposed two smoothing algorithms with an active set strategy and a new adaptive parameter update rule. The second introduces an artificial additional variable $t$ to transform the problem into an equivalent nonlinear programming problem (2), where $f_i(x): \mathbb{R}^n \to \mathbb{R}$ are Lipschitz continuous functions and $t \in \mathbb{R}$. Many algorithms can be applied to (2), such as gradient projection [5], interior point methods [6], trust-region methods [7], sequential quadratic programming (SQP) [8], penalty function methods [9], filter methods [10], and QP-free methods [11].
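To make the first smoothing approach concrete, the following is a minimal Python sketch of the exponential (log-sum-exp) smoothing of a finite maximum. It assumes the entropy function takes the common form $F_p(x) = \frac{1}{p}\ln\sum_{i=1}^{m}\exp(p f_i(x))$; this form, the names `smooth_max` and `smoothing_gap_bound`, and the sample values are illustrative assumptions and are not taken from the cited works. For $m$ component functions the smoothed value lies between $\max_i f_i(x)$ and $\max_i f_i(x) + \ln(m)/p$, so the gap shrinks as $p$ grows.

```python
import math


def smooth_max(values, p):
    """Entropy (log-sum-exp) smoothing of max(values) with parameter p > 0."""
    m = max(values)
    # Shift by the maximum so that the exponentials cannot overflow.
    return m + math.log(sum(math.exp(p * (v - m)) for v in values)) / p


def smoothing_gap_bound(count, p):
    """Uniform bound ln(count)/p on smooth_max(values, p) - max(values)."""
    return math.log(count) / p


# Hypothetical component values f_1(x), f_2(x), f_3(x) at some fixed x.
f_values = [1.0, 2.5, 2.4]
for p in (1.0, 10.0, 100.0):
    gap = smooth_max(f_values, p) - max(f_values)
    print(p, gap, smoothing_gap_bound(len(f_values), p))
```

The smoothed function is differentiable wherever the $f_i$ are, which is what allows standard smooth solvers to be applied, at the price of an approximation error controlled by $p$.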
The remarkable efficiency of SQP methods for nonlinear constrained optimization problems (NLP) has led to their extension to many other problems, including minimax problems [8-13]. However, the SQP iterates may fail to converge when the initial point lies far from the optimal point. Penalty function methods, proposed by Courant [13] in 1943, were introduced to improve the convergence of the algorithm. In a penalty function method, the function to be minimized is the sum of the objective function and a penalty term. In [9], Ma gives an exact smooth penalty function method for minimax problems with mixed constraints. However, choosing the penalty parameters during the iterative process is complicated, and the choice significantly affects the effectiveness of the method.
In 2002, Fletcher [14] proposed the filter algorithm, which avoids the choice of penalty parameters. Inspired by the idea of multi-objective programming, it considers the objective function and the constraint violation function separately. Because of its satisfactory numerical results, the combination of filter and SQP methods has been applied to the minimax problem. The authors of [15] gave a trust-region SQP filter method that uses nonmonotonic techniques to solve the unconstrained minimization problem. Luo [10] constructed a new feasible subproblem based on working sets and incorporated filter techniques. Although the filter method has good numerical performance, the filter set grows as it is updated, and so does the storage it requires. Moreover, the feasibility restoration phase is difficult to avoid in the filter method, which increases the computational effort.
To overcome the possible inconsistency of the subproblem and the high computational cost, Panier [16] proposed a QP-free algorithm (the SSLE algorithm) for optimization problems with inequality constraints, based on the KKT conditions and Newton's method. Each iteration requires solving two systems of linear equations with the same coefficient matrix and a least-squares subproblem. Global and superlinear convergence are established without assuming that the stationary points are isolated. In [11], Jian and Ma presented a new QP-free algorithm for minimax problems that exploits the special structure of these problems. Two further QP-free algorithms for constrained optimization problems were proposed in [17,18]. Inspired by these studies, this paper gives a nonmonotonic QP-free algorithm without a penalty function or a filter for the minimax problem, and proves its global convergence as well as its superlinear convergence under some mild conditions. Using the NCP function, the algorithm solves two systems of linear equations with the same coefficient matrix in each iteration, which can be viewed as a perturbed Newton-quasi-Newton iteration on the primal and dual variables of the KKT conditions. An adjustable operator is introduced that changes in each iteration according to the results of the previous iteration, thereby changing the degree of influence of the objective function in the acceptance mechanism. A nonmonotonic mechanism is used to avoid the Maratos effect, and a working set is introduced to further reduce the computational effort.
The paper is organized as follows. Section 1 reviews previous methods for solving the minimax problem. Section 2 describes the structure of the work. Section 3 discusses the implementation of the algorithm. Section 4 discusses the global convergence and the superlinear convergence rate of the algorithm. Section 5 gives numerical results and a practical application. The final section concludes the paper.
$(\bar X, \bar\mu)$ is called a KKT point of (3) if the KKT conditions (5) hold.
To construct the system of equations, we introduce the nonlinear complementarity problem (NCP) function. $\varphi(a, b)$ is called an NCP function if the following relationship holds,
$$\varphi(a,b) = 0 \iff a \ge 0,\ b \ge 0,\ ab = 0.$$
The Fischer-Burmeister NCP function is a simple NCP function with good theoretical properties and numerical performance. It is Lipschitz continuous, differentiable everywhere except at the origin, and strongly semismooth at (0, 0).
The Fischer-Burmeister NCP function used in this paper has the following structure:
$$\varphi_{FB}(a,b) = \sqrt{a^2+b^2} - a - b.$$
So we have
$$\nabla\varphi_{FB}(a,b) = \begin{cases} \left(\dfrac{a}{\sqrt{a^2+b^2}} - 1,\ \dfrac{b}{\sqrt{a^2+b^2}} - 1\right), & (a,b) \ne (0,0),\\[2mm] \{(\zeta - 1, \nu - 1) \mid \zeta^2 + \nu^2 = 1\}, & (a,b) = (0,0). \end{cases}$$
Then the NCP function $F_i$ is defined by
According to (5), define
Clearly, the KKT condition (5) is equivalent to $F(X, \mu) = 0$.
where $e_i = (0, \dots, 0, 1, 0, \dots, 0)^T$, and
If $(h_i(X), \mu_i) = (0, 0)$, introduce the following notations:
It holds that
Define the coefficient matrix $V^k$ according to (8)
where $B^k$ is an approximation of the Hessian matrix of $L(X, \mu)$, and
where $c > 0$ and $\tau > 1$ are given parameters.
In this work, to improve the convergence and flexibility of the algorithm, we substitute the objective function with the following equation:
We introduce such an adjustable operator, which is not a penalty parameter but is changed in each iteration according to the iteration results. Fig 1 shows the initial situation with $\delta = 1$.
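As a small numerical illustration of the Fischer-Burmeister function defined above, the following Python sketch evaluates $\varphi_{FB}(a,b) = \sqrt{a^2+b^2} - a - b$ and its gradient away from the origin. The function names `phi_fb` and `grad_phi_fb` and the sample points are hypothetical choices made for this example; the sketch only checks the NCP property on a few pairs and is not part of Algorithm A.

```python
import math


def phi_fb(a, b):
    """Fischer-Burmeister function: sqrt(a^2 + b^2) - a - b."""
    return math.hypot(a, b) - a - b


def grad_phi_fb(a, b):
    """Gradient of phi_fb at a point other than the origin."""
    r = math.hypot(a, b)
    if r == 0.0:
        raise ValueError("phi_fb is not differentiable at (0, 0)")
    return (a / r - 1.0, b / r - 1.0)


# Complementary pairs (a >= 0, b >= 0, a * b = 0) are zeros of phi_fb.
for a, b in [(0.0, 3.0), (2.0, 0.0), (0.0, 0.0)]:
    print((a, b), phi_fb(a, b))

# Pairs that violate complementarity or nonnegativity give a nonzero value.
for a, b in [(1.0, 1.0), (-1.0, 0.0), (0.0, -2.0)]:
    print((a, b), phi_fb(a, b))
```

Because the zeros of the function are exactly the complementary pairs, the complementarity part of the KKT conditions can be written as a system of equations, which is what makes a Newton-type linear system possible.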
We give the following mechanism as a substitute for the filter: a trial point is accepted if either (10) holds, or both (11) and
$$\Theta(X^{k+1}) \le \Theta(X^k) - \alpha_k\theta_2\|F^k_I\| \qquad (12)$$
hold. There are three regions in the first quadrant. If the trial point $X^k$ is located in region I at the $k$th iteration, i.e., (10) is satisfied at the current trial point but (11) and (12) are not, then the point is accepted. If the trial point $X^k$ is located in the reject region, i.e., neither the function value nor the constraint violation decreases sufficiently, the point is rejected. If the iteration point lies in region II, i.e., (10) does not hold but (11) and (12) are satisfied, the objective function is improved but the constraint violation function does not reach the sufficient descent condition, so we need to tighten the acceptance region (see Fig 2).
So we adjust the parameter $\delta_k$ as follows, where $\bar\delta$ is a constant. If $X^k$ is located in region III, the algorithm has made a good improvement in both the objective function and the constraint violation function, so we relax the acceptance criteria in the expectation of further improvement. This means increasing the value of $\delta_k$ to make the rejected region narrower (see Fig 3). Adjust $\delta_k$ as follows.
Algorithm A
Based on the above analysis, we give Algorithm A for problem (1).

Remark 2.1 Eqs (10) and (12) are constructed from the Lagrange multipliers and the KKT conditions. The solution of the system satisfies the first-order optimality conditions of the original problem.
Remark 2.2 For convenience, if (10) holds, the iterative step is called an F-step; if (11) and (12) hold, it is called a Θ-step.
Remark 2.3 Obviously, $\delta_k$ is bounded, and the Lagrangian function is Lipschitz continuous.

Implementation of algorithm
Assumption A1 $B^k$ is positive definite and there exist positive numbers $m_1$ and $m_2$ such that $m_1\|d\|^2 \le d^T B^k d \le m_2\|d\|^2$ for all $d \in \mathbb{R}^n$ and all $k$.
Assumption A2 $\{X \mid \omega(X) \le \omega(X^0)\}$ and $\|\mu^k + \lambda^k\|$ are bounded for $k$ sufficiently large.
Assumption A3 $\omega(X): \mathbb{R}^{n+1} \to \mathbb{R}$ and $h(X): \mathbb{R}^{n+1} \times \Omega \to \mathbb{R}$ are Lipschitz continuously differentiable. For all $a, b \in \mathbb{R}^{n+m}$,
$$\|\nabla\Omega(a) - \nabla\Omega(b)\| \le m_0\|a - b\|, \qquad \|F(a) - F(b)\| \le m_0\|a - b\|,$$
where $m_0 > 0$ is the Lipschitz constant.

Assumption A4 The Mangasarian-Fromovitz constraint qualification (MFCQ) is satisfied at X
Assumption A5 The sequence $\{B^k\}$ satisfies
$$\frac{\|(B^k - \nabla^2_X L(X^k, \mu^k))\,d^{k1}\|}{\|d^{k1}\|} \to 0.$$
Assumption A6 The strict complementarity condition holds at $(X^*, \mu^*)$.
Remark 3.1 It follows from A3 that the Lagrangian function (4) is Lipschitz continuous.
The following lemmas show that the algorithm is well defined.
Lemma 3.1 Step 1 terminates finitely.
Proof Assume that the conclusion is not valid. Then step 1 runs an infinite number of times.
From the definition of $W_k$, we know that $W_{k,i+1} \subseteq W_{k,i}$. As $i$ is large enough,
Due to $F^k \ne 0$, we have $\zeta^k_{W,j} \ne 0$ and $\nu^k_{W,j} \ne 0$ for any $j$ by their definitions.
Substituting (31) into (29) and then premultiplying (29) by $u^T$, we get
where $B^k$ is positive definite, and $(\nabla G^k)^T\bigl(-(\operatorname{diag}(\nu^k_W))^T\bigr)^{-1}\operatorname{diag}(\zeta^k_W)(\nabla G^k)^T$ is semi-definite.
So we get $u = 0$. Substituting $u = 0$ into (31) gives $v = 0$. Since $(u, v) = 0$ is the unique solution of $V^k (u^T, v^T)^T = 0$, $V^k$ is nonsingular, and the conclusion holds.
where $B^k$ is an approximation of the Hessian matrix of $L(X, \mu)$ and $\nabla\omega^k$ is the gradient vector of $\omega(X^k, \mu^k)$.
Proof From (15), we have
Substituting (34) into (32), then
Hence the conclusion holds.
Lemma 3.4 For any $0 < \alpha \le 1$, there exists $t_1 > 0$ such that
Proof If $F^k_I = 0$, then from (31), there exists $t_1 > 0$ such that for any $0 < \alpha \le 1$,
Therefore the conclusion holds for $F^k$
where $\operatorname{diag}(\zeta^k)$ and $\operatorname{diag}(\nu^k)$ are the diagonal matrices with diagonal elements $\zeta^{k0}_j$ and $(\nu^k_j - c^k_j)$. So we have
Hence the conclusion holds.
Lemma 3.5 If $F^k_I \ne 0$, then for any $\varepsilon > 0$ there exists $\bar\alpha > 0$ such that for any $0 < \alpha \le \bar\alpha$,
For any $i \ne 0$,
By (36) and (40)
Since $c^k_i \ne 0$, by the definitions of $c^k_i$ and $\nu^k_i$ and � k i = 0, the following equation holds
By (41) and (42), given any $\varepsilon > 0$, there is $\bar\alpha > 0$ such that, for any $0 < \alpha \le \bar\alpha$,
Hence the conclusion holds. From Lemmas 3.4-3.5, we can obtain the following Lemma 3.6.
Lemma 3.6 If $F^k_I \ne 0$, then given any $\varepsilon > 0$, there exists $\bar\alpha > 0$ such that, for any $0 < \alpha \le \bar\alpha$,
Theorem 3.1 If Algorithm A does not terminate at $X^k$, then there exists $\alpha_{\min} > 0$ such that, for $\alpha_k \ge \alpha_{\min} > 0$, either (10) holds, or both (11) and (12) hold.
If r k ¼ r k 1 , by the definition of ρ k , À g 1 ðd k0 Þ T ro k � h k max , kF k I k � yh k max . From Lemma 6, define � d > 1 g 3 À y 2 , for all sufficiently large k, we have and The proof is complete.

Convergence of algorithm
In this section, the global convergence and the superlinear convergence rate of the algorithm are discussed.
satisfies the KKT condition. In the following, without loss of generality, it is assumed that the algorithm does not terminate finitely.
which illustrates $F_I(X^k, \mu^k) \to 0$.
Case 2. If (11) and (43) hold for all $k$ sufficiently large, we prove the conclusion by contradiction. Then there exists $\epsilon_1 > 0$ such that
and
By (12), (43) and the definitions of $\theta$ and $\bar\delta$, we have
Since $\{\omega^k\}$ is monotonically decreasing, $\omega^k \to -\infty$ as $k \to +\infty$, which contradicts the hypothesis.
Case 3. The F-steps and Θ-steps appear alternately. According to Remark 2.2, the iterations from $k_t$ to $k_{t+1}$ and from $k_{t+2} + 1$ to $k_{t+3}$ are Θ-steps, and the iterations from $k_{t+1} + 1$ to $k_{t+2}$ are F-steps.
$h^k_{\max}$ is updated only at a transition from a Θ-step to an F-step, which indicates that the constraint violation is still large enough to decrease. From step 4, we know that if $h^{k+1}_{\max}$ is updated, then
Thus $h^k_{\max}(X)$ is monotonically decreasing and $\|F^k_I\|$ is monotonically decreasing for all $\alpha \ge 0$, and
So $h^{k_{t+1}+1}_{\max} \to 0$ as $t \to \infty$. Considering the nonnegativity of $h^k_{\max}$, we have $h^k_{\max} \to 0$ as $k \to \infty$. Along with (19), $\|F^k_I\| \le \max\{\theta_1, \bar\theta\}\,h^k_{\max}$. Therefore lim
The proof is similar to Lemma 5.4 in [20]. With Lemmas 4.1 and 4.2, we can obtain the global convergence of the algorithm.
Theorem 4.1 The accumulation point $(X^*, \mu^*)$ of the sequence $\{(X^k, \mu^k)\}$ is a KKT point pair of problem (15).
The convergence rate of the algorithm is discussed next. To obtain superlinear convergence, Assumptions A5-A6 are added.
We now discuss the application of the algorithm to investment portfolios. In the investment problem proposed by Markowitz, there are two objective functions to be considered. One is to maximize the return of the portfolio, and the other is to reduce the risk. In the traditional model, the latter is to minimize the risk (variance) over a set of feasible portfolios for a given level of expected return. By varying the expected return level, one obtains a set of nondominated portfolios for the two objectives; in the Markowitz model, this efficient frontier is determined by the mean and variance of the returns. Investors can choose a suitable portfolio by analyzing the expected investment and return.
In the traditional Markowitz mean-variance model, it is assumed that the investor has some wealth and is ready to invest in a set of $P$ securities. $R_k$ denotes the return of security $k$, which is a random variable. The mean value of $R_k$ can be calculated from historical data. Define the expected return of security $k$ as $\mu_k = E(R_k)$, $k = 1, \dots, P$, and let $x_k$ be the proportion allocated to security $k$. The weight vector $x$ needs to satisfy the following constraints,
$$e^T x = \sum_{k=1}^{P} x_k = 1.$$
The expected return of the portfolio is
$$\mu^T x = \sum_{k=1}^{P} \mu_k x_k.$$
The variance of the portfolio is
$$x^T Q x = \sum_{i=1}^{P}\sum_{j=1}^{P} Q_{i,j}\,x_i x_j,$$
where the covariance matrix $Q = \{Q_{i,j}\}$ is a symmetric positive semi-definite matrix with uncertain information. Assume that the short-term investment value is uncertain, but clearly $x_k \ge 0$, $k = 1, \dots, n$. We let $B_k$ and $A_k$ be the upper and lower bounds for $x_k$, that is, $A_k \le x_k \le B_k$.
According to the above definitions, the cardinality/mean-variance dual-objective optimization problem [22] can be described as
where $T(x) = \|x\|_0 = \sum_{k=1}^{P} \operatorname{sign}(|x_k|)$, and
$$\operatorname{sign}(x_k) = \begin{cases} -1, & x_k < 0,\\ 0, & x_k = 0,\\ 1, & x_k > 0. \end{cases}$$
Note that $\operatorname{sign}(|x_k|)$ is discontinuous, so we introduce the following approximation function, which is locally Lipschitz continuous,
where $\delta > 0$. For any $x_k \in \mathbb{R}$, $\lim_{\delta \to 0^+} \omega(x_k, \delta) = \operatorname{sign}(|x_k|)$. Let $y = (x, y_{P+1})^T \in \mathbb{R}^{P+1}$. We get the continuous approximation problem as follows:
$$\min_{y \in \mathbb{R}^{P+1}} w(y) = (\omega_1(x_1, \delta), \dots, \omega_P(x_P, \delta), y_{P+1})^T \quad \text{s.t.}\quad -\mu^T x,\ \ x^T Q x;\ \ e^T x = 1.$$
(48) can be regarded as a minimax problem, which can be solved using Algorithm A. For the case $P = 2$, the optimal portfolio obtained is [0.5; 0.5]. This result illustrates that, when the risk and return of each stock are equal, the proportion of each stock in the optimal strategy is determined by the investor's willingness to invest.
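As a small numerical illustration of the quantities in this section, the following Python sketch evaluates the expected return $\mu^T x$, the variance $x^T Q x$ and the cardinality $T(x) = \|x\|_0$ for the equally weighted portfolio [0.5; 0.5] with $P = 2$. The values of `mu` and `Q` are hypothetical, chosen only so that both securities have equal return and equal risk; they are not the data used in the paper, and the code does not run Algorithm A.

```python
def portfolio_return(mu, x):
    """Expected portfolio return mu^T x."""
    return sum(m * w for m, w in zip(mu, x))


def portfolio_variance(Q, x):
    """Portfolio variance x^T Q x for a covariance matrix Q given as rows."""
    n = len(x)
    return sum(x[i] * Q[i][j] * x[j] for i in range(n) for j in range(n))


def cardinality(x):
    """T(x) = ||x||_0, the number of nonzero weights."""
    return sum(1 for w in x if w != 0)


# Hypothetical data: two securities with equal return and equal risk.
mu = [0.1, 0.1]
Q = [[0.04, 0.0], [0.0, 0.04]]
x = [0.5, 0.5]

print(portfolio_return(mu, x), portfolio_variance(Q, x), cardinality(x))
```

With these assumed values, splitting the wealth equally halves the variance relative to holding a single security while leaving the expected return unchanged, at the cost of a larger cardinality; this is the trade-off that the dual-objective model balances.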

Conclusion
In this paper, we establish the global convergence properties of an adaptive QP-free method for solving the minimax problem. Combining the NCP function and the multipliers, the method solves two systems of linear equations with the same coefficient matrix in each iteration, which can be viewed as a perturbed Newton-quasi-Newton iteration on the primal and dual variables of the KKT conditions. A new filter substitution mechanism is given, which retains the advantages of the filter method, avoids the selection of penalty parameters, and eliminates the storage problems that may arise from the filter. The objective function is tuned by introducing a flexible operator. A nonmonotonic mechanism is used to avoid the Maratos effect to some extent, and the introduction of a working set further reduces the workload. The effectiveness and convergence of the algorithm are demonstrated without assuming that the stationary points are isolated.